Image to Image Translation for Domain Adaptation
We propose a general framework for unsupervised domain adaptation, which
allows deep neural networks trained on a source domain to be tested on a
different target domain without requiring any training annotations in the
target domain. This is achieved by adding extra networks and losses that help
regularize the features extracted by the backbone encoder network. To this end
we propose the novel use of the recently proposed unpaired image-to-image
translation framework to constrain the features extracted by the encoder
network. Specifically, we require that the features extracted are able to
reconstruct the images in both domains. In addition we require that the
distributions of features extracted from images in the two domains are
indistinguishable. Many recent works can be seen as specific cases of our
general framework. We apply our method for domain adaptation between MNIST,
USPS, and SVHN datasets, and Amazon, Webcam and DSLR Office datasets in
classification tasks, and also between GTA5 and Cityscapes datasets for a
segmentation task. We demonstrate state-of-the-art performance on each of these
datasets.
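As a rough illustration of how the constraints named in this abstract could combine into one training objective, the sketch below writes a task loss on labeled source images, a reconstruction term for both domains, and an adversarial term that pushes the two feature distributions together. The symbols (encoder E, source pair (x_s, y_s), target image x_t) and the weights lambda_rec and lambda_adv are assumptions made for this illustration; the abstract does not give the actual loss terms or their weighting.

```latex
% Hypothetical notation, not taken from the abstract:
%   E        encoder (backbone) network
%   x_s, y_s labeled source image and its annotation
%   x_t      unlabeled target image
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{task}}(x_s, y_s)
  + \lambda_{\text{rec}} \left[ \mathcal{L}_{\text{rec}}(x_s) + \mathcal{L}_{\text{rec}}(x_t) \right]
  + \lambda_{\text{adv}} \, \mathcal{L}_{\text{adv}}\big(E(x_s), E(x_t)\big)
```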
Beyond pairwise clustering
We consider the problem of clustering in domains where the affinity relations are not dyadic (pairwise), but rather triadic, tetradic or higher. The problem is an instance of the hypergraph partitioning problem. We propose a two-step algorithm for solving this problem. In the first step we use a novel scheme to approximate the hypergraph using a weighted graph. In the second step a spectral partitioning algorithm is used to partition the vertices of this graph. The algorithm is capable of handling hyperedges of all orders including order two, thus incorporating information of all orders simultaneously. We present a theoretical analysis that relates our algorithm to an existing hypergraph partitioning algorithm and explain the reasons for its superior performance. We report the performance of our algorithm on a variety of computer vision problems and compare it to several existing hypergraph partitioning algorithms.
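The two-step pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's method: the first step uses plain clique expansion (each hyperedge adds its weight to every pair of its vertices) as a stand-in for the novel approximation scheme, which the abstract does not specify. The second step bipartitions the resulting graph by the sign of the second eigenvector of the normalized adjacency matrix, found by power iteration. The function names and the toy hypergraph are hypothetical.

```python
import math
from itertools import combinations


def clique_expansion(n, hyperedges):
    """Step 1 (stand-in): approximate a hypergraph by a weighted graph.

    hyperedges is a list of (vertices, weight) pairs; hyperedges of any
    order, including ordinary edges of order two, are handled the same way.
    """
    W = [[0.0] * n for _ in range(n)]
    for vertices, weight in hyperedges:
        for i, j in combinations(sorted(set(vertices)), 2):
            W[i][j] += weight
            W[j][i] += weight
    return W


def spectral_bipartition(W, iters=500):
    """Step 2: split the vertices in two using the normalized spectrum."""
    n = len(W)
    deg = [sum(row) for row in W]
    if any(d <= 0 for d in deg):
        raise ValueError("every vertex needs positive degree")
    s = [math.sqrt(d) for d in deg]
    # B = (I + D^-1/2 W D^-1/2) / 2 has eigenvalues in [0, 1], so power
    # iteration converges to the largest one not removed by deflation.
    B = [[0.5 * ((1.0 if i == j else 0.0) + W[i][j] / (s[i] * s[j]))
          for j in range(n)] for i in range(n)]
    # The top eigenvector is proportional to sqrt(degree); deflate it.
    norm = math.sqrt(sum(deg))
    top = [x / norm for x in s]
    v = [math.sin(i + 1.0) for i in range(n)]  # deterministic start vector
    for _ in range(iters):
        dot = sum(a * b for a, b in zip(v, top))
        v = [a - dot * b for a, b in zip(v, top)]
        v = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(a * a for a in v))
        v = [a / length for a in v]
    # Rescale by D^-1/2 and threshold at zero to get the two clusters.
    return [0 if v[i] / s[i] < 0 else 1 for i in range(n)]


# Toy hypergraph: two triadic hyperedges joined by one weak pairwise edge.
toy_edges = [([0, 1, 2], 1.0), ([3, 4, 5], 1.0), ([2, 3], 0.1)]
labels = spectral_bipartition(clique_expansion(6, toy_edges))
```

On the toy input the two triples end up in different clusters. Which cluster is labeled 0 depends on the arbitrary sign of the eigenvector.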
Detecting the Starting Frame of Actions in Video
In this work, we address the problem of precisely localizing key frames of an
action, for example, the precise time that a pitcher releases a baseball, or
the precise time that a crowd begins to applaud. Key frame localization is a
largely overlooked but important action-recognition problem, for example in the
field of neuroscience, in which we would like to understand the neural activity
that produces the start of a bout of an action. To address this problem, we
introduce a novel structured loss function that properly weights the types of
errors that matter in such applications: it penalizes extra and missed
action-start detections more heavily than small misalignments. Our structured loss is
based on the best matching between predicted and labeled action starts. We
train recurrent neural networks (RNNs) to minimize differentiable
approximations of this loss. To evaluate these methods, we introduce the Mouse
Reach Dataset, a large, annotated video dataset of mice performing a sequence
of actions. The dataset was collected and labeled by experts for the purpose of
neuroscience research. On this dataset, we demonstrate that our method
outperforms related approaches and baseline methods using an unstructured loss
- …
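To make the idea of a matching-based loss concrete, here is a minimal sketch of a non-differentiable version suitable for evaluation. It pairs predicted and labeled start frames so that the total cost is minimal, charging a small per-frame cost for misalignment and a larger fixed cost for each extra or missed detection. The function name, the cost values, and the dynamic-programming formulation are assumptions made for this illustration. The abstract only says that the loss is based on the best matching and that the networks are trained on differentiable approximations of it.

```python
def matching_loss(pred, labels, miss_cost=10.0, extra_cost=10.0, per_frame=1.0):
    """Cost of the best matching between predicted and labeled start frames.

    With sorted frame indices and an absolute-difference cost, some optimal
    matching never crosses, so an alignment-style dynamic program finds it.
    """
    pred = sorted(pred)
    labels = sorted(labels)
    n, m = len(pred), len(labels)
    # dp[i][j]: best cost using the first i predictions and first j labels.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * extra_cost   # every prediction is an extra detection
    for j in range(1, m + 1):
        dp[0][j] = j * miss_cost    # every label is a missed detection
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            matched = dp[i - 1][j - 1] + per_frame * abs(pred[i - 1] - labels[j - 1])
            extra = dp[i - 1][j] + extra_cost
            missed = dp[i][j - 1] + miss_cost
            dp[i][j] = min(matched, extra, missed)
    return dp[n][m]


aligned = matching_loss([10, 52], [12, 50])    # two small misalignments
one_missed = matching_loss([10], [12, 50])     # one match, one missed start
far_apart = matching_loss([100], [0])          # cheaper to leave both unmatched
```

With the default costs, a two-frame misalignment costs 2, while a missed or extra start costs 10, so the missing and spurious detections dominate the loss, as the abstract describes.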